On the Usefulness of Weight-Based Constraints in Frequent Subgraph Mining
Abstract
Frequent subgraph mining is an important data-mining technique. In this paper, we look at weighted graphs, which are ubiquitous in the real world. Analyzing weights in combination with mining for substructures might yield more precise results. In particular, we study frequent subgraph mining in the presence of weight-based constraints and explain how to integrate them into mining algorithms. While such constraints yield only approximate mining results in most cases, we demonstrate that these results are nevertheless useful and explain this effect. To do so, we assess the completeness of the approximate result sets and carry out application-oriented studies on real-world data-analysis problems: software-defect localization, weighted graph classification and explorative mining in logistics. Our results show that the runtime can improve by a factor of up to 3.5 in defect localization and classification and 7 in explorative mining. At the same time, we obtain slightly increased defect-localization precision, stable classification precision and good explorative mining results.
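To illustrate why weight-based constraints tend to give approximate rather than complete results, here is a minimal toy sketch. It is not the algorithm from the paper. It assumes that every edge has a unique label, so a pattern can be treated as a set of edges without any isomorphism testing, and it assumes one particular constraint: a lower bound on the mean edge weight. The database, the thresholds and all function names are hypothetical.

```python
from statistics import mean

# Toy graph database (hypothetical data): each graph maps a uniquely
# labeled edge to its weight.
DB = [
    {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 10},
    {("a", "b"): 8, ("b", "c"): 1, ("c", "d"): 12},
]
MIN_SUPPORT = 2
MIN_MEAN_WEIGHT = 6.5  # assumed constraint: lower bound on mean edge weight


def supporting(pattern):
    """Return the graphs that contain every edge of the pattern."""
    return [g for g in DB if all(e in g for e in pattern)]


def satisfies(pattern):
    """Check the constraint on the mean weight over all supporting graphs."""
    weights = [g[e] for g in supporting(pattern) for e in pattern]
    return mean(weights) >= MIN_MEAN_WEIGHT


def extensions(pattern):
    """Grow a connected pattern by one edge that shares a vertex with it."""
    vertices = {v for e in pattern for v in e}
    all_edges = {e for g in DB for e in g}
    return [pattern | {e} for e in all_edges - pattern if vertices & set(e)]


def mine(prune_with_constraint):
    """Pattern-growth search; returns (result set, candidates generated)."""
    seeds = [frozenset([e]) for e in {e for g in DB for e in g}]
    stack, seen, result = list(seeds), set(seeds), set()
    while stack:
        p = stack.pop()
        if len(supporting(p)) < MIN_SUPPORT:
            continue  # frequency is anti-monotone, so this pruning is safe
        if satisfies(p):
            result.add(p)
        elif prune_with_constraint:
            continue  # not anti-monotone: satisfying supergraphs may be lost
        for q in extensions(p):
            if q not in seen:
                seen.add(q)
                stack.append(q)
    return result, len(seen)


if __name__ == "__main__":
    complete, n_complete = mine(prune_with_constraint=False)
    approximate, n_approximate = mine(prune_with_constraint=True)
    print(sorted(map(sorted, complete)), n_complete)
    print(sorted(map(sorted, approximate)), n_approximate)
```

In this toy setting, pruning on the constraint generates fewer candidates but misses the three-edge path. That pattern satisfies the constraint, yet both of its connected two-edge subpatterns violate it, so the pruned search never reaches it. Filtering only after a complete search keeps it.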
Similar resources
Frequent subgraph mining algorithms on weighted graphs
This thesis describes research work undertaken in the field of graph-based knowledge discovery (or graph mining). The objective of the research is to investigate the benefits that the concept of weighted frequent subgraph mining can offer in the context of the graph model based classification. Weighted subgraphs are graphs where some of the vertexes/edges are considered to be more significant t...
VisCFSM: Visual, Constraint-Based, Frequent Subgraph Mining
Graphs long have been valued as a pictorial way of representing relationships between entities. Contemporary applications use graphs to model social networks, protein interactions, chemical structures, and a variety of other systems. In many cases, it is useful to detect patterns within graphs. For example, one could be interested in identifying frequently occurring subgraphs, which is known as...
Semantically-Guided Clustering of Text Documents via Frequent Subgraphs Discovery
In this paper we introduce and analyze two improvements to GDClust [1], a system for document clustering based on the co-occurrence of frequent subgraphs. GDClust (Graph-Based Document Clustering) works with frequent senses derived from the constraints provided by the natural language rather than working with the co-occurrences of frequent keywords commonly used in the vector space model of doc...
The Gaston Tool for Frequent Subgraph Mining
Given a database of graphs, structure mining algorithms search for all substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. In order to make frequent subgraph mining more efficient, we propose to search with steps of increasing complexity. We present the GrAph/Sequence/Tree extractiON (Gaston) tool that implements this ide...
OO-FSG: An Object-Oriented Approach to Mine Frequent Subgraphs
Frequent subgraph mining (FSG) has always been an important issue in data mining. Several frequent subgraph mining methods have been developed for mining graph data. However, most of these are main memory algorithms in which scalability is a bigger issue. A few algorithms have opted for a relational approach that stores the graph data in relational tables. However, relational databases have the...
Publication date: 2010